Towards Fusion of Textual and Visual Modalities for Describing Audiovisual Documents
Abstract
Audiovisual documents offer a wide range of content descriptions through descriptors drawn from different media types. The extraction of these descriptions has received increasing attention, but a lack of semantic description persists, and this lack hinders the retrieval process. To address this problem, this paper describes an automatic, semantic description of cinematic audiovisual documents. The description is based not only on the audiovisual stream from the post-production phase but also on the documentation from the pre-production phase, using textual and visual modalities. In this context, we find it essential to extract the text superimposed on the image in order to extract the content description. This extraction is mainly based on a neural network classifier, and an effective OCR engine (Tesseract) is adapted for text recognition. Experimental results on two databases, namely “ICDAR 2011” and our own database created from the Internet Movie Database (IMDb), confirmed the performance of the approach.
Similar resources
A Comparative Analysis of the Effect of Visual and Textual Information on the Health Information Perception of High School Girl Students in Tehran
Purpose: Information and information sources can be divided into three broad categories according to their nature or type: textual information (books, journal articles, conference papers, dissertations, newspapers, etc.), visual information (infographics, photos, cartoons, films, etc.) and audiovisual information. The purpose of this study is to determine the effect of reading textual information in c...
Application of Topic Segmentation in Audiovisual Information Retrieval
Segmentation into topically coherent segments is one of the crucial points in information retrieval (IR). Suitable segmentation may improve the results of an IR system and help users find relevant passages faster. Segmentation is especially important in audiovisual recordings, in which navigation is difficult. We present several methods used for topic segmentation, based on textual, audio a...
TEXTUAL AND INTER-TEXTUAL ANALYSES OF IRANIAN EFL UNDERGRADUATES’ TYPES OF ENGLISH READING TOWARDS DEVELOPING A CAREFUL READING FRAMEWORK
This study investigated textual and inter-textual reading of a group of Iranian EFL undergraduates’ careful English reading types. In this research, Khalifa and Weir’s (2009) reading framework was used to propose a more inclusive aspect of a careful reading framework and the reading construct for instructional and assessment goals. The participants of this study were B.A. students of English Tr...
Audiovisual temporal fusion in 6-month-old infants
The aim of this study was to investigate neural dynamics of audiovisual temporal fusion processes in 6-month-old infants using event-related brain potentials (ERPs). In a habituation-test paradigm, infants did not show any behavioral signs of discrimination of an audiovisual asynchrony of 200 ms, indicating perceptual fusion. In a subsequent EEG experiment, audiovisual synchronous stimuli and s...
End-to-End Audiovisual Fusion with LSTMs
Several end-to-end deep learning approaches have been recently presented which simultaneously extract visual features from the input images and perform visual speech classification. However, research on jointly extracting audio and visual features and performing classification is very limited. In this work, we present an end-to-end audiovisual model based on Bidirectional Long Short-Term Memory...
Journal title:
- IJMDEM
Volume 6, Issue
Pages -
Publication date: 2015